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Abstract 



Estimation of parameters of random effects models from samples collected via complex 
multistage designs is considered. One way to reduce estimation bias due to unequal probabilities 
of selection is to incorporate sampling weights. Many researchers have been proposed various 
weighting methods (Kom, & Graubard, 2003; Pfeffermann, Skinner, Holmes, Goldstein, & 
Rasbash, 1998) in estimating the parameters of hierarchical models, including random effects 
models as a special case. In this paper, the bias of the weighted analysis of variance (ANOVA) 
estimators of the variance components for a two-level, one-way random effects model is 
evaluated. For these estimators, analytic bias expressions are first developed, the expressions are 
then used to examine the impact of sample size, intraclass correlation coefficient (ICC), and the 
sampling design on the bias of the estimators. In addition, two-stage sampling designs are 
considered, with a general probability design at the first stage (Level 2) and simple random 
sampling without replacement (SRS) at the second stage (Level 1). The study shows that first- 
order weighted variance component estimators perform well when for moderate cluster sizes and 
ICC values. However, noticeable estimation bias can be found with this weighting method for 
small cluster sizes (less than 20), particularly when ICC is small (less than 0.2). In such 
scenarios, scaled first-order weighted estimators can be an alternative. This paper is discussed in 
the context of National Assessment of Educational Progress (NAEP) 2003 4th Grade Reading 
National and State Assessment data, with Level 1 being the student level and Level 2 being the 
school level. 

Key words: random effects model, variance components, estimation bias, ANOVA estimators, 
complex sampling designs, selection probability, sampling weights, ICC, NAEP 
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1. Introduction 



The National Assessment of Educational Progress (NAEP) is a large-scale educational 
assessment designed to give information on what U.S. students know and can do. Data for the 
NAEP are collected from a complex multistage sample of schools and students, therefore 
sampling weights are required for proper analysis of these data. Online documentation from the 
National Center for Education Statistics (NCES) provides secondary data analysts with 
information on how to use weights on the NAEP data file when estimating means, population 
totals, and regression coefficients but nothing on how to use weights when fitting hierarchical 
models. Because these models are increasingly popular in educational research and several 
different weighting methods have been proposed for estimating the model parameters, guidance 
for data analysts is needed. The motivation for the research reported here was to offer such 
guidance for secondary analysts of NAEP data. 

Pfeffermann, Skinner, Holmes, Goldstein, and Rasbash (1998) and Graubard and Kom 
(1996) presented two methods for incorporating sampling weights in estimation of hierarchical 
models. The former used only first-order weights and the latter used both first- and second-order 
weights. First-order weights are (before adjustments for nonsampling errors) reciprocals of the 
inclusion probabilities of sampling units, while second-order weights are reciprocals of the joint 
inclusion probabilities of pairs of units. Estimates for parameters of hierarchical models that use 
only first-order weights are currently available in commercial software (e.g., HLM 6.0, MLWIN, 
LISREL, and Stata GLLAMM), but those using second-order weights are not available. Further, 
second-order weights are not typically provided on data files, so users have to produce them 
from knowledge of the sampling design, which is difficult for all but the most expert users. 

Estimators that are linear in the data (such as estimators of totals) are design-unbiased if 
they incorporate the appropriate first-order weights. However, weighting might not reduce 
design bias for those that are nonlinear in the data (such as estimators of variance components). 
In fact, Kom and Graubard (2003) noted that estimators of variance components that used only 
first-order weights could be substantially biased, even for designs with simple random sampling 
without replacement (SRS) at each stage. The goal of the current study is to determine when 
first-order weighted estimators of variance components are adequate and when they are not by 
focusing on data and designs related to those found in NAEP. 
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Section 2 reviews the background of sampling weights and hierarchical models. Section 3 
presents analytical expressions for bias of the first-order weighted ANOVA estimators under the 
random effects model. Section 4 characterizes the conditions under which the first-order 
weighted estimators studied in section 3 have an unacceptably high bias. In section 5, first- and 
second-order weighted ANOVA estimators are computed for a random effects model fit to the 
NAEP 2003 fourth-grade reading data. First-order weighted estimators adjusted by scaling are 
evaluated in section 6 . Finally, a summary and recommendations for users of NAEP data follows 
in section 7. 



2. Hierarchical Models and Sampling Weights 

When the purpose of an educational assessment program is to make valid inferences from 
a sample to a population of students, the students must be chosen according to a probability 
design; that is, the probability of selection of each sampled student must be known. Sampling 
designs for educational assessments often have a two-stage structure because it is cost-efficient 
to test groups of students from the same school. The selection probabilities for different schools 
and different students within a school may be unequal, and if they are, the estimation procedure 
must take this into account by weighting in order to assure approximately design unbiased 
estimation. One estimator that is design unbiased for the total for any probability design is the 
Horvitz-Thompson (H-T) estimator. It weights each student’s score by the inverse of his or her 
selection probability and can be written for the two-stage design as 

where k is the number of schools in the sample, m i is the number of students sampled from each 
selected school, y is is the score of the 5 th student in the zth school, n t = P( school i in sample) , 
and n Ai = Pfstudent 5 in sample I school i in sample) . The first-order weights, defined as 
w ( . = H 7T i and w sli = 1 / K sU , are needed to prevent bias if the design is informative; that is, if the 

model that holds for the sample is different from the model for the population (Pfeffermann & 
Smith, 1985). See Binder, Kovacevic, and Roberts (2005) and Binder and Roberts (2001) for 
more detailed discussion on the informativeness of the sampling design. 

For assessments such as NAEP, which collect a rich amount of background information, 
educational researchers may also be interested in fitting models designed to examine 
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relationships between a student’s performance and his or her personal or school characteristics. 
Because of the multistage sampling design, models accommodating the hierarchical structure are 
more appropriate for analysis. A simple hierarchical model (Raudenbush & Bryk, 2002) having 
two levels can be written as 

Level 1 : y b = fi 0i + x is + e is , ( 1 ) 

Level 2: /3 0i = Too + Toi^ + a 0i , 

P\, = 7io + 7nL +a u > 



for i = 1 and s = 1, m„ where x is are covariates corresponding to the student, z, are 
covariates corresponding to the school, (5 = [/?„,. , j8 u ] T is a vector of unknown regression 
parameters, and a ; = [a 0i ,a u ] T and e is are random effects, which are mutually independent and 
normally distributed with zero means and constant variances, Var[a i ) = Q and Var(e is ) = o] . 

This paper considered a simple special case of this model, the one-way random effects 
model, in which f3 0l = // was the grand mean and f3 n = 0. Thus our model is 

y, s = M +a i+ e is , (2) 



for i = 1 and s = 1 , where a ~ N ( 0, <j ] ) and £ is ~ ( 0. o] j , and a i and £ is are all 

mutually independent. Besides estimating the mean, or the variance components themselves, 
researchers may also be interested in estimating the intraclass correlation coefficient (ICC), 



ICC = 



^-2 , ^-2 
< T + <7 , 



(3) 



which is the proportion of total variability in scores due to the school-to-school differences. 

Kom and Graubard (2003) showed in a simulation study that the estimators of variance 
components that used only first-order weights were biased, even when the design was 
noninformative at both school and student levels. Their proposed estimators, which used the 
second-order weights, were nearly unbiased. 

Second-order weights are needed for an approximately unbiased estimation of variance 
components because the full-population functions of the data being estimated are nonlinear, 
specifically involving squares of sums of the individual scores. However, the estimation method 
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incorporating second-order weights is difficult to employ in practice, both because no 
commercial software is yet available and because second-order weights are not routinely 
included on data files. 

The next section develops analytical expressions for the bias of Graubard and Korn’s 
first-order weighted estimators of the variance components (Graubard & Kom, 1996) for the 
one-way random effects model. This process allows examination of the estimation bias for a 
larger range of sampling designs and population scenarios than simulation does. Most of the 
available commercial multilevel software packages use maximum likelihood based estimation 
methods (Chantala & Suchindran, 2006). However, any theoretical evaluation of the weighted 
estimators becomes rapidly intractable when the computation involves iterative methods and 
complex sampling structures. The focus of this paper is the analysis of variance (ANOVA) 
estimators (Searle, Casella, & McCulloch, 1992, p. 59), also known as method of moments 
estimators (Korn & Graubard, 2003) because they are easier to examine analytically. 



3. Bias of First-Order Weighted Analysis of Variance 
(ANOVA) Estimators 

3.1 First-Order Weighted ANOVA Estimators 

In a super-population view (Binder & Roberts, 2001), it is assumed that the data in a 
population have arisen from Equation 2 and we are interested in estimating its parameters // , 

cr , and <j] . If all students from all schools in the population are observed, the parameters fu , 

cr, and cr in Equation 2 can be estimated by (Searle et al., 1992): 



y K y y 

Y LmUs = 1 ‘ s 



s: = 



z>,-o 



z:,rA-c 2 , 



1 K _ _ / — — \ 2 ,SV 



(K -l)M { 



-rM-ry- 



M n 



(4) 

(5) 

(6) 



where K is the total number of schools in the population, AT is the total number of students 
within each school, Y t is the ith school average, Y is the overall average, and 
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M 0 = 



K - 1 



y A ' m . 

/- Ji=\ ' 



y M i 

/—!i = 1 ' 



y A ' m; 



( 7 ) 



Equations 4 to 6 are model consistent for the parameter values. Of course, access to data from all 
students in the population is usually not available. Instead, the parameters in Equation 2 must be 
estimated from a sample. If a sample from a two-stage probability sampling design of students 
chosen within schools is available, and if the sample units have equal selection probabilities at 
each of the two stages, then estimators of these expressions can be obtained by replacing the 
sums over all population units with the analogous sums over all sample units in Equations 4 to 7. 
But this estimation method can lead to biased results even asymptotically if either the students or 
the schools are unequally weighted (see Jia, 2007, for detailed discussion). 

Graubard and Kom (1996) suggested the first-order weighted ANOVA estimators: 



Z k m i 

i=\ L-Us=\ 



w i w sV y is 



i 



J eFW 



1 



Z k y 1 m i 

Z k m i / \ 2 

i = 1 w il^ s=i w s\i{yis-yi.Fw) 



S aFW 



m. 



OFW 



— -rZti w ‘(Zr=iyi<) (w 



■ y..Fw ) 2 



J eFW 



m, 



OFW 



( 8 ) 



( 9 ) 



(10) 



where 



m, 



OFW 



y w — i 

/—i /— i 1 



Y^ k y -1 n 

Z^Z, 



/*■- 



Y^ k y^ m i 
> W- > 

< ^ i = 1 1 ( ^ .9=1 



w, 



-z:az>,) ! 



s\i 



Z m i 

J’i.FW 






Z ttlj 

,=1 W * 



These statistics estimate // , a), and cr by replacing all population sums in Equations 4 to 7 
with weighted sample sums. The weighted estimator y FW is (for fixed sample sizes) unbiased 
for ju, but s] FW and s] FW require large sample sizes at both levels of the design for approximate 
unbiasedness for <y 2 e and <y 2 a . The sample size within the school is often not large, so there can 
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be substantial bias in the estimators. In the next subsection, expressions for their approximate 
biases are derived. 



3.2 Bias Expressions for the First-Order Weighted AN OVA Estimators 

Expressions of the approximate estimation bias for fairly general sample designs were 
developed to evaluate the performance of s 2 eFW and s 2 aFW . The designs considered were two- 
stage, with a general probability design at the school level and SRS at the student level, which 
are common in educational surveys, including NAEP. The school level selection probability n i 

was allowed to be related to both the school level random effect a i and the school population 
size AT . Then n i = 7r(M j ,a i ) , so that n i was also a random variable in this framework. 

The expectation of the estimators was approximated by taking the expectation of the first 
term of their Taylor expansion, first with respect to the sampling design and then to the model 
(see the appendix). This yielded an approximate relative bias for s 2 FW of 



RB 



I,a,e ( S eFw') 



E I,a.e ( S eFW ) ~ G ] Yji= i M ) ” K _ aVg(M / III) - 1 



N-K 



M - 1 



( 11 ) 



where N = AT , M = N / K , and avg(M / m ) = (1/ . / m j . Equation 11 shows 

that s 2 eFW was negatively biased, with larger relative bias for small school sample size (unless Mj 

is also small) and bounded below by -1. A complex design at the school level did not affect its 
approximate relative bias. 

The bias and relative bias of s 2 aFW were approximated using similar methods (see the 
appendix). The resulting bias expression (A20) was too complicated to be helpful for drawing 
general conclusions, so a simpler balanced case was considered in which AT ; = M and m j = m 
for all i. Then 



RB I,a,e{ S aFw) 



1 1-/CC 



K-E(w i ) m-l'] E(w i )~ 1 



v K - 1 M-lj 



m ICC 
-p(^ ij w i w j ,a i a j )sd{7r ij w i w j ) 



K - 1 

p(w i ,a 2 )sd(w i ) 

~K^l 



( 12 ) 
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where E{ ) and p{ ) were defined as the expectation and the correlation of the random 

variables with respect to the school level random effect a i . 

Note that if the schools were censused, all terms but the first in Equation 12 would have 
been equal to zero and the bias would have been positive unless the students were also censused 
(m = M). The relative bias in this case could have been large if the ICC and m were both small. 

The second term, 

K - 1 ’ 

was negative for a given sample but can be substantial only if a small proportion of schools in the 
population are selected in the sample. The next two terms were related to the informativeness of 
the sample. The third term rarely made an important contribution to the relative bias unless for 
designs where n tj is considerably different from n i n j . for example, if a small school level 

sampling rate was present. Otherwise, lt { . ~ 7T7ZV =1/ w w . . If extreme schools (those with either 
high or low scores) were oversampled, then the last term in Equation 12, 

p(w i ,af)sd(w i ) 



would have contributed a positive component to the relative bias. 

Since the bias expressions reported in this section are approximations, a simulation study 
was conducted to check how accurate they were in reflecting the true bias of the estimators. In 
the simulation, we assumed a population of K = 1,500 schools, each of size M = 56 students 
(which was the estimated average population size of schools in the NAEP 2003 fourth-grade 
reading sample). A two-stage stratified design was selected with two strata at the school level 
and SRS at the student level. Three experimental factors (denoted as Factors A, B, and C) were 
considered. Factor A varied the nature of the informativeness of the stratification design: Level 
A/ indicated oversampling schools with large values of |a ; | (extreme schools, symmetric strata), 

and Level A? indicated oversampling schools with large values of a i (high-performing schools, 
asymmetric strata). Factor B denoted the sample size assignment at the school level. Defining 
Stratum 1 as the oversampled stratum and Stratum 2 the remainder, Level B { denoted selecting 



7 




all the units from Stratum 1 and half of units from Stratum 2(k { = Aj;fc 2 =K 2 /2) and Level B, 

denoted selecting 90 schools from Stratum 1 and nine schools from Stratum 2(k l - 90; k 2 - 9) . 

Factor C was the student-level sample size, with Ci denoting a large sample (m = 23, which was 
the average school sample size for the NAEP 2003 fourth-grade reading sample) and C 2 denoting 
a small sample (m = 5). 

The population data ( K = 1,500, M = 56 for all schools) was simulated using Equation 2, 
with (T e = 1 and ICC = 0.23. Then 5,000 samples were simulated from the data for each of the 

2x2x2 = 8 conditions just described. The first-order weighted estimators s 2 eFW and s 2 aFW from 

Equations 9 and 10 were computed for each sample, the bias for each estimator was computed by 
averaging the estimates, and the relative bias was computed. The results are reported in Table 1. 
Note that <j 2 a = ICC ■ a) , and for a given ICC value, further simulation results suggest that for 

any given o] value, the relative biases of s] FW and s 2 aFW were almost identical to the ones 

presented in Table 1, and the differences were mostly due to the simulation error. Expressions 
for relative bias were then computed from Equations 1 1 and 12 for each of the eight designs. The 
table shows that the simulated and analytically derived approximate biases are very similar in all 
cases considered. Based on this result, the analytic expressions were used to investigate the 
conditions under which the bias of the first-order weighted estimators of variance components 
would be problematic. 

4. Examination of Bias of the First-Order Variance and Weighted Analysis of 

Variance (ANOVA) Estimators 

The bias expressions derived in section 3 provided a systematic way to examine 
estimation bias for a variety of models and sampling designs. Equations 11 and 12 show that the 
relative bias of the first-order weighted estimators of the variance components was affected by 
sample sizes, sampling rates, ICC, and the informativeness of the design. This section uses these 
expressions to examine how much these factors affect the bias and to determine how important 
that bias is. The examples of the previous section and its results in Table 1 show that the relative 
bias of the variance components estimators could vary tremendously and that cases could exist at 
both extremes; that is, when the effect on bias was negligible (as in the upper half of Table 1) 
and when it was unacceptably high (as in the lower half of Table 1). 
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Table 1 



Comparison of Simulated and Approximate Relative Bias (RB) of First-Order Weighted 
Estimators From a One-Way Random Effects Model With Informative Designs 







A1 (symmetric strata) 


A2 (asymmetric strata) 




ms 2 j 


RB(sl) 


msi ) 








Cl (m = 


23) 




B1 


Simulated 


-2.6% 


8.7% 


-2.6% 


8.8% 




Analytic 


-2.6% 


8.7% 


-2.6% 


8.8% 


B2 


Simulated 


-2.6% 


2.4% 


-2.6% 


8.1% 




Analytic 


-2.6% 


3.2% 


-2.6% 


7.3% 








C 2 (m = 


: 5) 




B1 


Simulated 


-18.5% 


62.1% 


-18.6% 


62.2% 




Analytic 


-18.6% 


62.3% 


-18.6% 


62.3% 


B2 


Simulated 


-18.8% 


55.2% 


-18.8% 


59.2% 




Analytic 


-18.6% 


55.2% 


-18.6% 


59.2% 



Note. Simulation results are based on 5,000 iterations. Analytic results were calculated from 
Equations 11 and 12. 



The goal in this section is to characterize the situations in which the first-order weighted 
estimators of variance components are adequate and when they are not. This was done by 
systematically varying features of the model parameters and sampling design and using the 
analytic expressions of bias for evaluation. 



4.1 Effect of Sample Size Under Balanced Noninformative Designs 

Section 3 noted that the first-order weighted estimators of the variance components could 
be substantially biased even if the sampling design was noninformative. In the first example, the 
bias in the first-order weighted estimator of the between- and within-school variance components 
was examined. The simple case of a single-stage sample from a population of equal-sized 
schools was assumed; that is, all schools and a simple random sample of m students within each 
school were selected. From Equations 11 and 12, 



RBraMrw) = 



M — m 
(. M — 1 )m 



(13) 
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(14) 



RB I.a,e( S aFW ) 



M-m 1 -ICC 
(M-l)m ICC 



Figure 1 shows these relative biases for a range of school population sizes (M) and school sample 
sizes (m) when ICC = 0.2. If a relative bias of 10% or greater in magnitude was considered to be 
unacceptably large, then s 2 eFW had too large of a bias if m < 10 for M ranging from about 40 to 

140. The estimator s 2 aFW also required larger values of m to have an acceptably small bias. For 
example, m needed to be at least 20 when M = 40 and at least 30 when M = 100. 



4.2 Effect of Varying Population and Sample Sizes Under Unbalanced Noninformative Design 
The second example was designed to examine whether varying school population sizes or 
varying school sample sizes affected the bias of the first-order weighted variance component 
estimators. It was assumed that the school population size AT followed a specified distribution. 

It was also assumed that all schools and a simple random sample of m ; students per school were 
selected. Equation A20 (see the appendix) could then be simplified to 



RBuaMrw) 



y K M _y K ML 

m, 1 ] . 



ICC 



Y' MM. 

( J & 7=1 1 J 



ICC 






M,.(m,-1) 



m 



i ) 



Y K M. 

Z— (/=! 1 ^ 



ICC 



,cc 



(15) 



Again ICC = 0.2 as in the first example. In order to examine a realistic range of distributions of 
school population size, we first fitted a gamma distribution to the empirical distribution of estimated 
school population sizes from the NAEP 2003 fourth-grade reading assessment by matching the first 
two moments (M wejghted = 56, S wei hted (M ) = 44). The corresponding coefficient of variation (CV) is 

0.78. Figure 2 plots the histogram of the estimated school population size along with the gamma 
density approximation. Then K (= 1,500) units was generated from that gamma distribution. To have 
varying school sample sizes, m. =M i /2 was set. In addition, cases were considered for which the 
school population sizes were generated from three other gamma distributions with approximately the 
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same mean value (= 56) but varying CVs, both smaller and larger than those observed in the NAEP 
data. The corresponding histograms are displayed in Figure 3. 



M= 40 




m 



M=80 




m 



M=120 




m 



M=60 




m 



M=100 




m 



M=140 




m 



Figure 1. Relative bias of first-order weighted variance estimators as a function of school 
population and sample sizes for a noninformative design in which all schools are sampled 
and a simple sample of m students are selected within each school. 

Note. The dashed lines are the bench marks for -10% and 10% relative bias ("-relative bias of 
the estimators of the between-school variance; - relative bias of the estimator of the within- 
school variance.) M = school population size; m = school sample size. 
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Figure 2. Histogram of the estimated school population size for National Assessment of 
Educational Progress (NAEP) 2003 fourth-grade national assessment. 

Note. M = estimated school population size. 




0 50 100 200 300 0 50 100 200 300 




0 50 100 200 300 0 50 100 200 300 



M M 

Figure 3. Histogram of the simulated school population size. 

Note. The distributions from which the finite population of school were generated from top left 
to the bottom right: gamma(0. 25,0. 004), gamma(l, 0.018), gamma( 1.70,0.030), and gamma(25, 
0.448). M = school population size. 
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Table 2 shows the relative biases computed from Equations 12 and 15. Note that gamma 
(1.70, 0.030) was chosen to approximate the school population size distribution for the above 
given NAEP assessment. It can be seen that the estimators underestimated the within-school 
variability and overestimated the between-school variability, as in the equal school size case. In 
addition, even though the CV of the school sizes varied from 0.2 to 2.0, the relative biases 
calculated were all similar to the one with the constant school population size of 56 
{KB, ae (s 2 eFW ) = - 1.8% and RB r ae (s 2 aFW ) - 7.3% ). The result suggested that varying school 
population sizes and varying school sample sizes did not have a substantial effect on the relative 
bias of s 2 eFW and s 2 aFW . 

Table 2 

Relative Bias (RB) of the First-Order Weighted Estimators of Within-School and Between- 
School Variance Components for Variable School Population Size and School Sample Size 



Model 


CV(M ) 


RB ,.a.e (*eFW ) 


RB I,a,e ( S IfW ) 


Gamma(0. 25, 0.004) 


2 


-1.9% 


7.6% 


Gamma( 1.00,0.0 18) 


1 


-1.8% 


7.1% 


Gamma( 1.70,0.030) 


0.78 


-1.8% 


7.2% 


Gamma(25, 0.448) 


0.2 


-1.8% 


7.3% 



Note. The RB s for comparable constant school sample size cases for within-school and between- 
school variance components are -1.8% and 7.3%, respectively. CV = coefficient of variation; 

M = school population size. 



4.3 Joint Effect of School Sample Sizes and Interclass Correlation Coefficient (ICC) Level 
The joint effect of the school sample sizes and ICC on the bias of the estimators of the 
between-school variance component was examined next. Kovacevic and Rai (2003) observed 
from a simulation study that the relative bias of their proposed weighted estimators increased as 
the ICC level decreased. Similar results were found in the simulation study conducted by 
Asparouhov (2006). The analytic bias expression and Table 1 show that the effect of ICC on 
RBj ae (s 2 aFW ) was mitigated by large school sample size (m). This example looked systematically 
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at the joint effect of these factors for both informative and noninformative designs. The analysis 
was restricted to the equal school and sample size case for simplicity. 

In this example, the number of schools in the population was fixed as 1,500, and the 
population was assumed to follow the model in Equation 2. Four different school level designs 
were considered. The first three were informative and the last was noninformative (SRS at the 
school level). The three informative designs were all stratified, with strata defined by varying 
cut-points on the school random effect. In a real application, the stratification design would 
likely be less informative than these, so in some sense, this example was the worst case. Design 1 
oversampled high-performing schools (that is, a school belonged to Stratum 1 if a t > <7 a and to 

Stratum 2 otherwise); Design 2 oversampled above-average schools (strata defined by cl > 0 

and cl < 0 ); and Design 3 oversampled extreme -performing schools (strata defined by 

\a i | > 0.6745 ■ (7 a and | a i | < 0.6745 ■ a a ). Design 4 selected schools by SRS. For the first three 

designs, 90 schools were sampled from the oversampled stratum and nine from the other one; 99 
schools were selected for the fourth design. At the student level, a sample was randomly selected 
without replacement from each selected school. The school population size was 56, and the 
school sample sizes ranged from 5 to 30. We investigated bias for ICC from 0.05 to 0.30. 

The relative biases of s 2 aFW were calculated using Equation 12, where vv, and tt were all 

functions of normally distributed random variable cl . Figure 4 plots RB t ae (s 2 aFW ) as a function 

of ICC and m under the four given designs. The trends were similar for the four designs, showing 
that the relative bias increased as ICC decreased and as school sample size decreased. A design 
having small school sample sizes could make the relative bias unacceptable. The informative 
designs showed similar magnitudes of bias as the noninformative design, so it appeared that the 
relative bias of the first-order weighted estimators of the between-school variance components 
was mainly due to the school sample size and ICC effect. 

4.4. Summary 

The purpose of this section was to examine whether the first-order weighted estimators had 
an acceptably small bias for estimation of variance components in the random effects model. Our 
examples showed that the first-order weighted variance components estimators were biased under 
both informative and noninformative designs. However, the degree of informativeness of the 
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school sampling design was not the main factor contributing to the bias. The first-order weights 
appeared to remove most of the bias due to this source. Rather, the relative bias was large when the 
ICC and school sample size were both small. In any particular case, when a data analyst has an 
idea about the size of ICC, m, and M. he can investigate the magnitude of the relative bias by using 
the simplified expressions in Equations 13 and 14 when K is relatively large. 



Oversample High-performance Schools 




0.05 0.10 0.15 0.20 0.25 0.30 



ICC Level 
Design I 

Undersample extreme performing Schools 




0.05 0.10 0.15 0.20 0.25 0.30 



ICC Level 
Desian III 



Oversample Above Average Schools 




0.05 0.10 0.15 0.20 0.25 0.30 

ICC Level 
Desgin II 



Simple Random Sampling of schools 




0.05 0.10 0.15 0.20 0.25 0.30 



ICC Level 
Desian IV 



Figure 4. Effect of interclass correlation coefficient (ICC), school sample size (in), and 
sampling design on the magnitude of the relative bias of the first-order weighted estimator 
of the between-school variance component. 
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5. Application — National Assessment of Educational Progress (NAEP) 

2003 Fourth-Grade Reading Assessment 

In the previous section, we examined the size of the bias of the first-order weighted 
estimators of variance components in the random effects model for a variety of parameter 
settings and design features. In this section, we calculate first-order and second-order weighted 
estimates (Korn & Graubard, 2003) of the variance components from a random effects model 
fitted to the NAEP 2003 fourth-grade reading assessment data for the nation as a whole and for 
two jurisdictions. Although the true values of the variance components weren’t known, it was 
known that the second-order weighted estimators were approximately unbiased (Korn & 
Graubard, 2003). Hence, the appropriateness of the first-order weighted estimators was evaluated 
and compared to results based on second-order weights. 

More than 187,000 students from 54 jurisdictions were assessed in the NAEP 2003 
fourth-grade reading assessment. Jurisdictions included states, the District of Columbia, U.S. 
territories, and Department of Defense schools. The sampling design is described briefly as 
follows: Schools were stratified with one stratum per state for public schools and several region- 
based strata for private schools. Within each stratum, schools were selected using a stratified 
systematic probability proportional to size design so as to oversample minority, nonpublic, and 
relatively large schools. This step was followed by a simple random sample of students drawn 
from each school. The average school sample size for the national sample was 23; the estimated 
average school population size was 56. First-order weights for both stages of the sample design 
were available from the restricted use data file. 

We fitted a one-way random effects model to the NAEP national data, using one of the 
plausible values (Mislevy, 1991) for the assessment score as the response variable. Estimation of 
the model was conducted twice: once computing first-order weighted estimators as given in 
Equations 8 through 10 and once computing second-order weighted estimators as specified in 
Kom and Graubard (2003). Because second-order weights were not provided on the NAEP file, 
they had to be inferred from the first-order weights and from knowledge about the sample 
design. As all the details about the school level design were not known, the simplifying 
assumption was made that the selection of schools was independent; that is, n {j = . At the 

student level, we calculated second-order selection probabilities for students from school i as 
n sAi = m i (m i -l)/M j (M j -1) , as it would be for SRS within school. Based on this analysis, the 
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ICC was estimated by the second-order weighted estimators to be around 0.24. Both Figure 4 
and Equation 1 1 suggested that bias of the first-order weighted estimators of variance 
components would not likely be a problem for this combination of ICC and sample size. 

In addition, the one-way random effects models were fitted using both first-order and 
second-order weighted estimation methods to data from two jurisdictions. The jurisdictions were 
chosen to exemplify different kinds of weight structures. All the schools for Jurisdiction 1 were 
selected so the design was noninformative. The sample consisted of 24 schools with an average 
school sample size of 30. The estimated average school population size was 64, and the ICC 
value was estimated at around 0.08 from the second-order weighted estimators. Jurisdiction 2 
had a design for which several extreme-performing schools (those with high and low 
performance) had large weights. The sample consisted of about 120 schools. The average school 
sample size was 16; the estimated average school population size was 32. The ICC for reading 
assessment score was estimated to be 0.34 based on the second-order weighted estimators. 
Equation 1 1 suggested that bias of estimators of the within-school variance component was not 
likely to be a problem for either jurisdiction. Figure 4 suggested that the first-order weighted 
estimator of the between-school variance for Jurisdiction 2 was also likely to have acceptable 
bias, but that we should be cautious when using it for Jurisdiction 1 due to the small value of 
ICC, even for the design’s relatively large school sample size. 

Table 3 shows the estimates of variance components as well as ICC calculated using first- 
and second-order weights for the national data and the two jurisdictions. In parentheses below 
each first-order weighted estimator is the estimated relative bias, calculated as the difference 
between the first- and second-order weighted estimators divided by the value of the second-order 
weighted estimators. This assessment of the actual bias of the first-order weighted estimator is 
reasonable if our approximated second-order weights are accurate. The results show, as 
expected, that the estimated relative bias was negative for all estimates of within-school variance 
and positive for estimates of between-school variances. The estimated relative biases were less 
than 10% for all variance component estimators except the between-school component for 
Jurisdiction 1. This result was predicted due to the small ICC value in that jurisdiction. However, 
in cases like Jurisdiction 1, where less than 10% of total variance contributes to the differences 
among schools before introducing any regression models, multilevel modeling might not be 



17 




necessary. This study shows that the analytic expressions can accurately predict which estimators 
will perform better based on our knowledge of the design and population characteristics. 



Table 3 



First- and Second-Order Weighted Estimators of Variance Components and Intraclass 
Correlations Coefficients (ICC) for 2003 National Assessment of Educational Progress 
(NAEP) Fourth-Grade Reading Assessment Data 



Estimators using... 


Estimates of 

O) 


Estimates of 


Estimates of 
ICC 




NAEP national data 


First-order weights 


1026.5 


355.9 


0.26 




(-2.3%) 


(7.2%) 


(8.3%) 


Second-order weights 


1050.6 


331.9 


0.24 


NAEP Jurisdiction 1 data 


First-order weights 


1616.3 


175.1 


0.10 




(-1.7%) 


(19.6%) 


(25%) 


Second-order weights 


1644.8 


146.4 


0.08 


NAEP Jurisdiction 2 data 


First-order weights 


1111.8 


573.9 


0.34 




(-2.8%) 


(4.7%) 


(3.0%) 


Second-order weights 


1144.4 


571.2 


0.33 



Note: The estimated relative bias, calculated as the difference between the first- and second- 
order weighted estimators divided by the second-order weighted estimators, is in parentheses. 



6. Weight Scaling 

It was noted that the first-order weighted estimators of the variance components were 
biased regardless of whether the sampling design was informative. One approach to reduce the 
bias of the first-order weighted variance component estimators was to scale the weights. Recent 
statistical literature provided several scaling methods (Asparouhov, 2006; Korn & Graubard, 
2003; Pfeffermann et al., 1998; Rabe-Hesketh & Skrondal, 2006; and Stapleton, 2002). 
Pfeffermann et al. proposed two scaling procedures that only scaled the student within-school 
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conditional weight ( w sU ). To be more specific, the scaled student conditional weight under their 
Scaling Method 1 was 



(i) 



Z m, 

5=1 



, =l W vl/ 






(16) 






and the sum of w® over s was equal to the effective sample size 






Z rrij 

s = 1 






Under Pfeffermann’s Scaling Method 2, the scaled student conditional weight was 



( 2 ) 

w\. = w. 



m. 



v s\i 



s\i 



z:, 






(17) 



For this method, the sum of w™ over 5 was equal to the sample size for school i . 

For designs that were SRS at the student level, Pfeffermann’s Scaling Method 2 was 
appropriate to produce an approximately unbiased estimator of the within-school variance. For 
such designs, the scaled student conditional weight in Equation 17 was equal to 



w'f = 



Z m i 



m, 



m : 



Z m i 

5=1 



= 1 . 



= , W 5l« 



and the scaled first-order weighted (SFW) estimator ( s ] SFW ) reduced to the unweighted one (with 
weight of 1), which was approximately unbiased, so that 

( 18 ) 

However, the SFW estimator ( s 2 aSFW ) of the between-school variance was still biased. For 
the same sampling design assumed before with constant M /s and m [ ' s , when scaled weights 
were used, 
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RB ,.a,e( S lsFW ) 



l~E{w,) 
( K - 1 )m 



1 -ICC 



ICC 



-p(7T ij a i a j ,z i z j )sd(7r ij w i w j ) 



+ 



1 -E(w,) p(w i ,af)sd(w i ) 



K - 1 



(^- 1 ) 



( 19 ) 



where p( ) -E( ) , and sd ( ) were all taken with respect to a . Note that Equation 19 was 
approximately zero for large K while the first two moments of w ; . were finite or if a large 
fraction of schools was selected. 

To examine the accuracy of the bias expressions for the SFW estimators, the simulation 
study in section 3.2 was revisited. The scaled weighted estimators were calculated for each 
simulated sample, averaged over 5,000 replications to obtain the relative biases, and compared 
with values computed from Equations 18 and 19. Table 4 shows that the simulated and 
calculated relative biases were similar for all parameters in all four scenarios. Thus the SFW 
estimators of within-school variance were approximately unbiased and those of between-school 
variance were negatively biased. The relative bias of s 2 aSFW was trivial for k ~ 750 (Condition B x ) 
and increased a bit for k = 99 (Condition B 2 ). Compared to the first-order weighted estimators 

whose relative biases are shown in Table 3 for the same sample designs, those of the SFW 
estimators were much smaller. 

In summary, scaling of the first-order weighted estimator using Scaling Method 2 
(Pfeffermann et al., 1998) eliminated most of the bias from estimators of the variance 
components for designs that were SRS at the student level, along with a large number of schools 
in the population or a large fraction of schools being selected. 



7. Summary and Discussion 

The analytic bias expressions derived in this paper are based on one-way random effects 
models and ANOVA estimators. Such models commonly serve as the preliminary step in the 
hierarchical model fitting in providing information about the outcome variability at each of level 
of the model (Raudenbush & Bryk, 2002). 

The research results suggest that incorporating first-order weights can help to reduce bias 
due to the informativeness of sampling designs. However, large relative bias still exists when 
both school sample size and ICC values are small, regardless of the design informativeness. The 
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Table 4 



Comparison of Simulated and Approximate Relative Bias (RB) of the Scaled First-Order 
Weighted Estimators From a One-Way Random Effects Model With Informative Designs 
at Level 2 







A1 (asymmetric strata) 


A2 (symmetric strata) 


RB ( S eSFW ) 


rn&rw) 


RB ( S eSFW ) 


R B(Nsfw) 




Ci (m - 


= 23) 




B1 


Simulated 


0.02% 


-0.03% 


0.00% 


0.01% 




Analytic 


0.00% 


-0.07% 


0.00% 


0.02% 


B2 


Simulated 


-0.03% 


-6.35% 


0.01% 


-0.67% 




Analytic 


0.00% 


-5.57% 


0.00% 


-1.52% 








C 2 (m 


= 5) 




B1 


Simulated 


0.00% 


-0.23% 


0.00% 


0.09% 




Analytic 


0.00% 


-0.08% 


0.00% 


-0.03% 


B2 


Simulated 


-0.26% 


-6.92% 


-0.31% 


-2.90% 




Analytic 


0.00% 


-7.15% 


0.00% 


-3.10% 



Note. Simulation results are based on 5,000 iterations. Analytic results were calculated from 
Equations 18 and 19. 



study also found that with small sample sizes (less than 20) and small ICC values (less than 0.2), 
if the weights are relatively constant at both student and school levels, then the unweighted 
estimators of variance components will be less biased than the first-order weighted estimator. On 
the other hand, if the weights vary at either level, then the second-order weighted estimators are 
needed for estimating variance components. This difference presents a dilemma for data users as 
second-order weights typically do not exist in the database, and constructing those weights 
accurately requires a level of knowledge about the design that is not likely to be available either, 
not to mention the unavailability of commercial software to compute these second-order 
weighted estimators. In that case, scaled first-order weighted estimators that were discussed in 
section 6 provide an alternative to the difficult-to-use second-order weighted estimators for 
designs in which SRS is used at the student level, given a large number of schools in the 
population or a large fraction of schools being selected. But until some method of making the 



21 




second-order weights available to users is implemented in publicly available software programs, 
an adequate and unique solution does not appear to be available. 

As a limitation of the analytic approach, the obtained bias expressions only apply to the 
sampling designs described in this study. The bias expressions will become much more difficult 
to tackle if the SRS assumption at the student level is violated. Simulation studies might be a 
practical approach for future study of various sampling schemes at lower levels of hierarchical 
models. 
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Appendix 

Bias Expression of First-Order Weighted Estimators 

Bias Expression of the First-Order Weighted Estimator of the Within-School Variance 

The first-order weighted ANOVA estimator of the within-school variance is given as 



sse,_ 






(Al) 



with 



sse 



FW 



Z r V' t 2 r v ' j —2 

/= 1 r , W 'L^ iWJis - Zu i= 1 7 » W / 2^=1 ^uVi.FW 



(A2) 



where / and I sU are indicator functions with 



[l if unit i is in the sample 
[0 if unit i is not in the sample 



I 



sli 



if unit 5 within i is in the sample, given that unit i is in the sample 
Otherwise 



and 



yi.FW 






Z M i r 



s\i 



The expectations of / and 7 sli with respect to the sampling design are 

E P (/) = ^,= |/ w t and E p (/,, ) = x sU =1/ w sli . 

We first take the expectation of each quantity on the right side of Equation Al with respect to the 
design, then to the model 

% (0) = (8) = (») (A3) 
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Given SRS at Level 1, the student selection probability is independent of the student level 
random effect e is , and with the property of 






m : 



M. 



(A4) 



Given the designs, Expression A3 can be further simplified as 

E& Ec u E pI \gj E pU ^ n (*) = E^i E^ u E pn ^ E pll (*)■ 



Therefore, 



=%%-[Z',Z";(A+«,+^f 



vt 



= E 



Si 






111 

- + a: + a: + 



2 Jua i )M i 



(A5) 



and 



% z:,^z:'. 



—2 



i.FW 






i.FW 



= E 



Si 



y k .7T i W i 

4—li = 1 ' ' 



//-M, +afM i + 



E m i 2 

S ^W\, 



= E 



Si 



X, i M 2+a f + — + 2 P a i 

l m. 



AT.. 



AT.. 



of + 2/ua i M i 



(A6) 



As a result, 



Ec p (sse FW ) —E^j 



= E 



Si 



Zi (a 2 + «, 2 + y + 2//«, ) m , - x,!i 

^ M ( . [m i — 1)^ 



// CLj ~\ (J e + 

m. 



M. 
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/» 
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m. 



y 



(A7) 
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Meanwhile, 



J=1 Is\i w s\i 1 EgiEpngiEg n E p n 



^ s -l^s\i^s\i ^ 



The right side of Expression A7 can be written as 



Egi E pI \g! Eg n E pII 



zi^-tz 



M i J t 

s=i s\i^s\i ^ 



= E sl E pllll (Y,'‘j,w l (M i - 1 



Equations A6 and A8 together yield 



Ep p ( S eFW ) 



y^ K f Mj (m i l) 









k | m i —M j 

»*, , 



(All) 



Bias Expression of the First-Order Weighted Estimator of the Between- School Variance 

The first-order weighted ANOVA estimator of the between-school variance is given as 



S aFW ,-^K 



(Z/=l I i W l - V > m 0 FW m 0 
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(A 14) 
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(A 16) 
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% = Zm M > E V ( w i a *) + 'L* 1 M i M J E $i (WW;) 



we have 
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On the other hand, the expectation of Equation A15 is 
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Combining Equations A 10, A 17, and A 18, the delta method gives 



(A18) 
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